Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaboration decision-making for CAVs is essential to generate appropriate driving behaviors to enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used in solving decision-making problems. However, the existing DRL-based methods have been mainly focused on solving the decision-making of a single CAV. Using the existing DRL-based methods in mixed autonomy traffic cannot accurately represent the mutual effects of vehicles and model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study on different GRL methods is further proposed based on the designed framework to verify the effectiveness of GRL methods. Results show that the GRL methods can well optimize the performance of multi-agent decision-making for CAVs in mixed autonomy traffic compared to the DRL methods. Finally, challenges and future research directions are summarized. This study can provide a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods into intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.
translated by 谷歌翻译
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become de facto to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structure heterogeneity and poses a granular-mismatch issue, leading to an inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces global-consistent representations crossing different fine granularity and then applies multi-granular aligned distillation merely during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, which show state-of-the-art performance.
translated by 谷歌翻译
An enhanced geothermal system is essential to provide sustainable and long-term geothermal energy supplies and reduce carbon emissions. Optimal well-control scheme for effective heat extraction and improved heat sweep efficiency plays a significant role in geothermal development. However, the optimization performance of most existing optimization algorithms deteriorates as dimension increases. To solve this issue, a novel surrogate-assisted level-based learning evolutionary search algorithm (SLLES) is proposed for heat extraction optimization of enhanced geothermal system. SLLES consists of classifier-assisted level-based learning pre-screen part and local evolutionary search part. The cooperation of the two parts has realized the balance between the exploration and exploitation during the optimization process. After iteratively sampling from the design space, the robustness and effectiveness of the algorithm are proven to be improved significantly. To the best of our knowledge, the proposed algorithm holds state-of-the-art simulation-involved optimization framework. Comparative experiments have been conducted on benchmark functions, a two-dimensional fractured reservoir and a three-dimensional enhanced geothermal system. The proposed algorithm outperforms other five state-of-the-art surrogate-assisted algorithms on all selected benchmark functions. The results on the two heat extraction cases also demonstrate that SLLES can achieve superior optimization performance compared with traditional evolutionary algorithm and other surrogate-assisted algorithms. This work lays a solid basis for efficient geothermal extraction of enhanced geothermal system and sheds light on the model management strategies of data-driven optimization in the areas of energy exploitation.
translated by 谷歌翻译
Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform inferiorly compared to Convolutional Neural Networks (CNN). This is mainly because the new proposed modules are difficult to converge well from scratch due to lacking inductive bias and easy to focus on the occlusion and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but brings excessive computations. On the contrary, we present two attentive pooling (AP) modules to pool noisy features directly. The AP modules include Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impacts of less relevant features. The proposed APP is employed to select the most informative patches on CNN features, and ATP discards unimportant tokens in ViT. Being simple to implement and without learnable parameters, the APP and ATP intuitively reduce the computational cost while boosting the performance by ONLY pursuing the most discriminative features. Qualitative results demonstrate the motivations and effectiveness of our attentive poolings. Besides, quantitative results on six in-the-wild datasets outperform other state-of-the-art methods.
translated by 谷歌翻译
Monocular depth estimation has been actively studied in fields such as robot vision, autonomous driving, and 3D scene understanding. Given a sequence of color images, unsupervised learning methods based on the framework of Structure-From-Motion (SfM) simultaneously predict depth and camera relative pose. However, dynamically moving objects in the scene violate the static world assumption, resulting in inaccurate depths of dynamic objects. In this work, we propose a new method to address such dynamic object movements through monocular 3D object detection. Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose while leaving the static pixels corresponding to the rigid background to be modeled with camera motion. In this way, the depth of every pixel can be learned via a meaningful geometry model. Besides, objects are detected as cuboids with absolute scale, which is used to eliminate the scale ambiguity problem inherent in monocular vision. Experiments on the KITTI depth dataset show that our method achieves State-of-The-Art performance for depth estimation. Furthermore, joint training of depth, camera motion and object pose also improves monocular 3D object detection performance. To the best of our knowledge, this is the first work that allows a monocular 3D object detection network to be fine-tuned in a self-supervised manner.
translated by 谷歌翻译
With the success of the prompt-tuning paradigm in Natural Language Processing (NLP), various prompt templates have been proposed to further stimulate specific knowledge for serving downstream tasks, e.g., machine translation, text generation, relation extraction, and so on. Existing prompt templates are mainly shared among all training samples with the information of task description. However, training samples are quite diverse. The sharing task description is unable to stimulate the unique task-related information in each training sample, especially for tasks with the finite-label space. To exploit the unique task-related information, we imitate the human decision process which aims to find the contrastive attributes between the objective factual and their potential counterfactuals. Thus, we propose the \textbf{C}ounterfactual \textbf{C}ontrastive \textbf{Prompt}-Tuning (CCPrompt) approach for many-class classification, e.g., relation classification, topic classification, and entity typing. Compared with simple classification tasks, these tasks have more complex finite-label spaces and are more rigorous for prompts. First of all, we prune the finite label space to construct fact-counterfactual pairs. Then, we exploit the contrastive attributes by projecting training instances onto every fact-counterfactual pair. We further set up global prototypes corresponding with all contrastive attributes for selecting valid contrastive attributes as additional tokens in the prompt template. Finally, a simple Siamese representation learning is employed to enhance the robustness of the model. We conduct experiments on relation classification, topic classification, and entity typing tasks in both fully supervised setting and few-shot setting. The results indicate that our model outperforms former baselines.
translated by 谷歌翻译
Vision Transformer已成为计算机视觉中的新范式,表现出出色的性能,同时还具有昂贵的计算成本。图像令牌修剪是VIT压缩的主要方法之一,这是因为相对于令牌数的复杂性是二次的,而许多仅包含背景区域的令牌并不能真正促进最终预测。现有作品要么依赖其他模块来评分单个令牌的重要性,要么为不同的输入实例实施固定比率修剪策略。在这项工作中,我们提出了一个自适应的稀疏令牌修剪框架,成本最低。我们的方法是基于可学习的阈值,并利用多头自我注意力来评估令牌信息,但几乎没有其他操作。具体而言,我们首先提出了廉价的注意力重点加权阶级注意力评分机制。然后,将可学习的参数插入VIT作为阈值,以区分信息令牌和不重要的令牌。通过比较令牌注意分数和阈值,我们可以从层次上丢弃无用的令牌,从而加速推理。可学习的阈值在预算感知培训中进行了优化,以平衡准确性和复杂性,并为不同的输入实例执行相应的修剪配置。广泛的实验证明了我们方法的有效性。例如,我们的方法将DEIT-S的吞吐量提高了50%,并且TOP-1的准确性仅下降了0.2%,这比以前的方法在准确性和延迟之间取得了更好的权衡。
translated by 谷歌翻译
联合学习(FL)是一种机器学习范式,允许分散的客户在不共享其私人数据的情况下进行协作学习。但是,过度的计算和沟通要求对当前的FL框架构成挑战,尤其是在训练大型模型时。为了防止这些问题阻碍FL系统的部署,我们提出了一个轻巧的框架,客户共同学习融合由多个固定预训练的模型生成的表示形式,而不是从SCRATCH培训大型模型。这通过考虑如何从预先训练的模型中捕获更多特定于客户的信息,并共同提高每个客户利用这些现成模型的能力,从而导致我们解决了一个更实用的FL问题。在这项工作中,我们设计了一种联合原型对比度学习(FEDPCL)方法,该方法通过其类原型共享客户的知识,并以原型对比度方式构建特定于客户的表示。共享原型而不是可学习的模型参数可以使每个客户以个性化的方式融合表示表示,同时以紧凑的形式保持共享知识以进行有效的通信。我们在轻量级框架中对拟议的FEDPCL进行了彻底的评估,以测量和可视化其在流行的FL数据集上融合各种预训练模型的能力。
translated by 谷歌翻译
在计算机视觉和摄影测量协会中,Perspective-N-Point(PNP)问题已被广泛研究。随着功能提取技术的开发,单镜头可能会提供大量功能点。有望设计一个一致的估计器,即,随着点的数量增加,估计值可以收敛到真实的摄像头姿势。为此,我们提出了一个一致的PNP求解器,称为\ emph {cpnp},并消除了偏差。具体而言,线性方程是通过原始投影模型通过测量模型修改和可变消除构建的,基于该模型,基于该模型的最小二乘解决方案。然后,我们分析并减去该溶液的渐近偏置,从而产生一致的估计值。此外,执行高斯 - 纽顿(GN)迭代以完善一致的解决方案。我们提出的估计器在计算方面有效 - 它具有$ O(n)$计算复杂性。关于合成数据和真实图像的实验测试表明,就估计精度和计算时间而言,我们提出的估计量优于一些具有密集视觉特征的图像的知名图像。
translated by 谷歌翻译
二进制神经网络(BNNS)对现实世界中嵌入式设备显示出巨大的希望。作为实现强大BNN的关键步骤之一,规模因子计算在减少其实价对应物的性能差距方面起着至关重要的作用。然而,现有的BNN忽略了实价重量和尺度因子的固有双线关系,从而导致训练过程不足引起的亚最佳模型。为了解决这个问题,提出了复发性双线性优化,以通过将固有的双线性变量关联到背面传播过程中,以改善BNNS(RBONN)的学习过程。我们的工作是从双线性角度优化BNN的首次尝试。具体而言,我们采用经常​​性优化和密度 - 列表来依次回溯稀疏的实价过滤器,该过滤器将经过充分的训练并基于可控的学习过程达到其性能限制。我们获得了强大的rbonn,在各种模型和数据集上的最先进的BNN上表现出令人印象深刻的性能。特别是,在对象检测的任务下,rbonn具有出色的概括性能。我们的代码在https://github.com/stevetsui/rbonn上进行开源。
translated by 谷歌翻译